Abstract

Estimating the innovation probability density is an important issue 
in any regression analysis. This paper focuses on functional autore- 
gressive models. A residual-based kernel estimator is proposed for the 
innovation density. Asymptotic properties of this estimator depend on 
the average prediction error of the functional autoregressive function. 
Sufficient conditions are studied to provide strong uniform consistency 
and asymptotic normality of the kernel density estimator. 

Key words: kernel density estimation - nonparametric residuals - functional 
autoregressive models - martingale approach - multivariate central limit the- 
orem 

2000 Mathematics Subject Classification: 62G07 - 62G08 - 62G20 

1 Introduction 



Any regression estimation procedure raises important questions about the
a posteriori diagnostics of the model assumptions. Diagnostic tools are
generally based on the residuals. For example, one may have to check whether
the innovations are Gaussian, as is required for variable selection or model
change detection, among others. Checking such an assumption may involve
estimating the innovation density and investigating the asymptotic convergence
properties of the estimate. Kernel-based methods are among the most common
nonparametric methods used for that purpose. Since the pioneering works of
Rosenblatt [22] and Parzen [19], a wide range of literature has become
available on kernel density estimation. We refer the reader to [10], [11], [21]
for some interesting books on density estimation in the context of independent
and identically distributed samples, mixing processes, etc. However, few papers
investigate the asymptotic properties of a kernel density estimator (KDE for
short) associated with the driven noise in a given regression or autoregressive
model.

When dealing with such models, the driven noise is not observed. Its
probability density function (pdf for short) can only be estimated through the
residual error calculated from an estimate of the unknown component of the
model. This component must therefore be estimated with an appropriate
convergence rate so that the residual error inherits good properties. A common
noise density estimator is the Parzen-Rosenblatt kernel estimator based on this
residual error, which is then regarded as a noise predictor.

Chai et al. [6] proved the uniform strong consistency on $\mathbb{R}$ of the
noise KDE in the linear regression case.

The linear parametric autoregressive case is studied, for example, in Koul
[14], who gave weak convergence results, and in Cheng [8], who also showed that
the asymptotic distribution of the maximum of a suitably normalized deviation
of the density estimator from the expectation of the kernel error density
(based on the true error) is the same as in the one-sample setup, which is
given in Bickel and Rosenblatt [1]. In the nonlinear parametric autoregressive
framework, Liebscher [16] obtained almost sure uniform convergence of the KDE
on compact sets and asymptotic normality results. Convergence rates were
improved by Müller et al. [17] with the use of weighted kernel density
estimators. Moreover, Cheng extended in [9] his results of [8] to the nonlinear
case. A goodness-of-fit test for the errors was also derived in Lee and Na [15]
and Bachmann and Dette [2]. Conditions on the stationarity of the time series
are given in all these references.

The nonparametric framework has received little attention so far, and only
in the regression case. It was first studied by Ahmad [1] in a fixed design
regression model. He proved pointwise and uniform almost sure convergence of
the noise KDE, but without providing convergence rates. In a more general
regression setting, Cheng [7] gave sufficient conditions under which the
density estimator based on nonparametric residuals is consistent. One of these
conditions is that the estimator of the nonlinear regression function has to be
uniformly weakly consistent. Efromovich [13] pointed out that the nonparametric
framework for error density estimation is "extremely complicated due to its
indirect nature". He made developments under the customary assumption that the
regression function is differentiable and the error density is twice
differentiable. More recently, Plancade [20] proposed a density estimator
constructed by model selection and applied it in the nonparametric regression
framework.

In this paper we are interested in estimating the error density function of
a functional autoregressive model of order 1. This framework combines the
difficulties encountered both in the nonparametric regression setting and in
the autoregressive setting. Models have the following general form

$$X_n = f(X_{n-1}) + \varepsilon_n \quad (n \in \mathbb{N}), \tag{1.1}$$

where $X_n \in \mathbb{R}^d$ is observed, the function $f$ from $\mathbb{R}^d$
to $\mathbb{R}^d$ is unknown and $\varepsilon = (\varepsilon_n)_{n \ge 0}$ is
the driven noise with zero mean, positive definite covariance matrix $\Gamma$
and unknown probability density function $p$. The initial state $X_0$ is given
and is independent of $\varepsilon$.

Since the white noise $(\varepsilon_n)_{n \ge 1}$ is not observed, we have to
construct a predictor sequence $(\hat\varepsilon_n)_{n \ge 1}$. If $f$ were
known, $X_n - f(X_{n-1})$ would be a good predictor of $\varepsilon_n$.
However, since $f$ is unknown, we have to estimate it in such a way that the
residual $\hat\varepsilon_n = X_n - \hat f_{n-1}(X_{n-1})$ is a "good"
predictor of $\varepsilon_n$, where $\hat f_n$ is an estimator of $f$. The case
of functional autoregressive models raises an additional difficulty for the
analysis of residuals. The objective of the present paper is to propose a
residual-based recursive kernel estimator for the innovation density in that
case, and to study its asymptotic properties.

To estimate the unknown pdf $p$ we use a recursive version of the well-known
Parzen-Rosenblatt kernel-based density estimator: for any $y \in \mathbb{R}^d$,
we estimate $p(y)$ by

$$\hat p_n(y) = \frac{1}{n} \sum_{i=1}^{n} i^{\alpha d} K\big(i^{\alpha}(\hat\varepsilon_i - y)\big), \tag{1.2}$$

where $K$ is a kernel function and the bandwidth parameter $\alpha$ is a real
number in $]0, 1/d[$. The choice of a recursive estimator was favored to allow
the use of



Density Estimation in Functional Autoregressive models 






martingale techniques in exploring the asymptotic properties of $\hat p_n$.
Recursive estimators also have the advantage of not requiring the stationarity
of the process $(X_n)$ from the initial instant.
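As an illustration only, a recursive kernel density estimate of the form $\frac{1}{n}\sum_{i=1}^{n} i^{\alpha d} K(i^{\alpha}(\hat\varepsilon_i - y))$ can be sketched in a few lines. The product triangular kernel and the names `triangular_kernel` and `recursive_kde` are assumptions made for this sketch; they are not taken from the paper, which only requires a kernel with the properties stated later in [A5].

```python
import numpy as np


def triangular_kernel(u):
    """Product of 1-D triangular kernels (an illustrative choice):
    nonnegative, Lipschitz, bounded, supported on [-1, 1]^d, integrates to 1."""
    u = np.atleast_2d(u)
    return np.prod(np.clip(1.0 - np.abs(u), 0.0, None), axis=-1)


def recursive_kde(residuals, y, alpha, kernel=triangular_kernel):
    """Evaluate (1/n) * sum_i i^(alpha*d) * K(i^alpha * (e_i - y)) at a point y.

    residuals: array of shape (n, d) holding the residuals e_1, ..., e_n.
    alpha:     bandwidth exponent, assumed to lie in (0, 1/d).
    """
    residuals = np.asarray(residuals, dtype=float)
    n, d = residuals.shape
    i = np.arange(1, n + 1, dtype=float)
    # The i-th residual is smoothed with bandwidth i^(-alpha).
    scaled = (i ** alpha)[:, None] * (residuals - np.asarray(y, dtype=float))
    weights = i ** (alpha * d)
    return float(np.sum(weights * kernel(scaled)) / n)
```

Because the $i$-th term depends only on the $i$-th residual, the sum can be updated online as new observations arrive, which is the practical counterpart of the recursive form.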

Under suitable regularity conditions on the density function $p$, the link
between the estimation error of $\hat p_n$ and the errors of $\hat f_n$ may be
formulated as follows: for all $y \in \mathbb{R}^d$,

$$|\hat p_n(y) - p(y)| = O\Big(\frac{1}{n} \sum_{i=0}^{n-1} \|\hat f_i(X_i) - f(X_i)\|\Big) + o(1).$$

That is, the estimation error of $\hat p_n$ will always depend on the average
prediction error of $\hat f_n$, whose convergence to 0, i.e.

$$\frac{1}{n} \sum_{i=0}^{n-1} \|\hat f_i(X_i) - f(X_i)\| = o(1) \quad \text{a.s.} \tag{1.3}$$

is the major difficulty in proving the convergence of $\hat p_n$ to $p$. It is
clear that since the process $(X_n)$ is not bounded, this last result requires
strong convergence results on $\hat f_n$. The main difficulty is then to find
an estimator of $f$ that meets this requirement. Since no structural assumption
is set on $f$, we choose to use a recursive version of the well-known
Nadaraya-Watson kernel estimator [18], [25], studied for example by Senoussi
[23], see also Duflo [12]. Proving (1.3) with this estimator will be the first
step to achieve before studying the asymptotic properties of $\hat p_n$.

The paper is organized as follows. The framework and the assumptions are
presented in Section 2, together with the properties they induce on model
(1.1). Section 3 is dedicated to the study of the nonparametric kernel
estimator of $f$: strong consistency and conditions for achieving (1.3).
Asymptotic properties of the KDE $\hat p_n$ (1.2) are studied in Section 4:
uniform strong consistency and central limit theorem (CLT for short). Proofs of
the main results are postponed to the appendices.

2 Model assumptions and properties 

The following set of assumptions is common when dealing with autoregressive
functional models (Duflo [12]).

Assumption [A1]. Function $f$ is continuous and there are two positive
constants $r_f < 1$ and $C_f$ such that for any $x \in \mathbb{R}^d$,

$$\|f(x)\| \le r_f \|x\| + C_f. \tag{2.1}$$

Assumption [A2]. The initial state $X_0$ and $\varepsilon = (\varepsilon_n)_{n \ge 0}$
have a finite moment of order $m > 2$.

These assumptions ensure good stability properties of the process
$(X_n)_{n \ge 0}$. In particular, since by [A2] the noise has a finite moment of
order $m > 2$, we have $\varepsilon_n^* := \sup_{i \le n} \|\varepsilon_i\| = o(n^{1/m})$
a.s. and we derive from Proposition 6.2.14 of Duflo [12] that almost surely

$$\sum_{i=1}^{n} \|X_i\|^m = O(n) \quad \text{and} \quad \sup_{i \le n} \|X_i\| = O(\varepsilon_n^*) = o(n^{1/m}). \tag{2.2}$$

These two results will be useful in the rest of the paper.

2.1 Strengthening Assumption [A2] 

Assumption [A2] is rather standard and holds for many probability
distributions. However, it may be interesting to restrict attention to
particular subfamilies of noises, depending on their tail distribution. This is
particularly useful to get more precise properties, such as better convergence
results or better asymptotic bounds. Restrictions to noises with a finite
exponential moment and to Gaussian noises are presented in this paragraph.

Assumption [A2bis]. There is $m > 0$ such that $E[\exp(m\|X_0\|)] < \infty$ and
$E[\exp(m\|\varepsilon_1\|)] < \infty$.

The conditions of Proposition 6.2.15 of [12] are satisfied under (2.1) and
[A2bis], which implies that, for any $a < m$, almost surely

$$\sum_{i=1}^{n} \exp(a\|X_i\|) = O(n) \quad \text{and} \quad \sup_{i \le n} \|X_i\| = O(\log n). \tag{2.3}$$

In the same spirit as [A2bis], we are interested in what happens with
Gaussian white noises. This is the subject of the next Proposition, which is
an adaptation of Proposition 6.2.15 of [12].
Proposition 2.1 Consider Model (1.1) where $(\varepsilon_n)$ is a Gaussian white
noise with invertible covariance matrix $\Gamma$. This implies that there is
$m < 1/(2\lambda_{\min}(\Gamma))$ such that $E[\exp(m\|\varepsilon_1\|^2)] < \infty$.
Assume that $E[\exp(\|X_0\|^2 / (2\lambda_{\min}(\Gamma)))] < \infty$. Assume also
that $f$ is continuous and that there is $c_f \in ]0, 1[$ such that

liminf (c/||xf - ||/(x)f) > -log(E [exp(m||ei|n]) . (2.4) 

||a;||^oo ?Ti(l — Cf) 

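Then,

$$\sup_{i \le n} \|X_i\| = O\big(\sqrt{\log n}\big) \quad \text{a.s.} \tag{2.5}$$

and for any $a < (1 - c_f)/(2\lambda_{\min}(\Gamma))$,

$$\sum_{i=1}^{n} \exp(a\|X_i\|^2) = O(n) \quad \text{a.s.} \tag{2.6}$$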



Proof: Under assumption (2.4), for some finite constants $M > 0$ and
$\delta > 0$, if $\|x\| > M$, we have

||/(x)f < CfWxr - log (E [exp(m||5if )]) - 6 (2.7) 

In addition, as $f$ is continuous, $\sup_{\|x\| \le M} \|f(x)\| < \infty$. Let us
set $Z_n = \exp(m(1 - c_f)\|X_n\|^2)$ and
$\mathcal{F}_n = \sigma(X_0, \varepsilon_1, \ldots, \varepsilon_n)$. For any
$0 < c < 1$ and any $x, y \in \mathbb{R}^d$, we have
$\|x + y\|^2 \le \|x\|^2/c + \|y\|^2/(1 - c)$. Thus,

$$E[Z_{n+1} \mid \mathcal{F}_n] \le E[\exp(m\|\varepsilon_1\|^2)] \, \exp\Big(\frac{m(1 - c_f)}{c_f} \|f(X_n)\|^2\Big) \tag{2.8}$$

and using (2.7), we derive that for some positive constants $c$ and $C$,

$$E[Z_{n+1} \mid \mathcal{F}_n] \le e^{-c} Z_n + C. \tag{2.9}$$

Finally, applying Proposition 6.2.12 of [12] with the Lyapunov function
$V(x) = \exp(m(1 - c_f)\|x\|^2)$, we obtain (2.5) and (2.6) for any
$a < m(1 - c_f)$ and any $m < 1/(2\lambda_{\min}(\Gamma))$. This closes the proof
of Proposition 2.1. □



Remark 2.2 When $f$ satisfies [A1], we can find $c_f \in ]r_f, 1[$ such that
assumption (2.4) is fulfilled. Indeed, (2.1) implies that
$\|f(x)\|^2 \le r_f \|x\|^2 + C_f^2/(1 - r_f)$. Hence, when $f$ satisfies [A1]
and $(\varepsilon_n)$ is Gaussian, (2.5) and (2.6) hold for any
$a < (1 - r_f)/(2\lambda_{\min}(\Gamma))$. In particular, we have

$$\sum_{i=1}^{n} \exp\big((1 - r_f)\|X_i\|^2 / (2\lambda_{\min}(\Gamma))\big) = O(n) \quad \text{a.s.} \tag{2.10}$$

This stability property will be useful when dealing with Gaussian noises.



2.2 Asymptotic stationarity and properties 

A main consequence of this framework is a stationarity property. Indeed,
with [A1] and by assuming that the distribution of $(\varepsilon_n)$ has a
probability density function $p > 0$, the process $X = (X_n)_{n \ge 0}$ is
asymptotically stationary and possesses an invariant distribution $\mu$ which
has a finite moment of order $m$ and a probability density function denoted
$h$, which satisfies for any $x \in \mathbb{R}^d$,

$$h(x) = \int p(x - f(t)) \, h(t) \, dt. \tag{2.11}$$

Moreover, the following property holds: for any $\mu$-integrable function
$g : \mathbb{R}^d \to \mathbb{R}$ which satisfies $|g(x)| \le C(\|x\|^m + 1)$
(where $C$ is a constant), the strong law of large numbers states that

$$\frac{1}{n} \sum_{i=1}^{n} g(X_i) \xrightarrow[n \to \infty]{\text{a.s.}} \int g(x) \, d\mu(x). \tag{2.12}$$

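Besides, for a positive constant $R$, we have

$$\frac{1}{n} \sum_{i=0}^{n-1} \mathbf{1}_{\{\|f(X_i)\| \le R\}} \ge 1 - \frac{1}{nR} \sum_{i=0}^{n-1} \|f(X_i)\|. \tag{2.13}$$

Thus, applying (2.12) yields that, for $R > \int_{\mathbb{R}^d} \|f(x)\| \, h(x) \, dx$,

$$\liminf_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \mathbf{1}_{\{\|f(X_i)\| \le R\}} > 0 \quad \text{a.s.} \tag{2.14}$$

which means that the process $(X_n)_{n \ge 0}$ infinitely often crosses the ball
of radius $R$ centered on 0. This last property will be useful for proving
convergence results of $\hat f_n$ over dilated sets.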






3 Properties of the kernel estimator $\hat f_n$

We shall now introduce the estimator of $f$. Since no structural assumption
was set on function $f$, we chose a recursive nonparametric estimator,
following the well-known Nadaraya-Watson estimator. Let $K$ be a kernel and
$\beta$ a real number in $]0, 1/d[$. Then, for any $x \in \mathbb{R}^d$, we
estimate $f(x)$ by

$$\hat f_n(x) = \frac{\sum_{i=1}^{n-1} i^{\beta d} K\big(i^{\beta}(X_i - x)\big) X_{i+1}}{\sum_{i=1}^{n-1} i^{\beta d} K\big(i^{\beta}(X_i - x)\big)} \tag{3.1}$$

if the denominator of (3.1) is not equal to 0, and by 0 otherwise. Like
$\hat p_n$, it has a recursive form which allows the use of martingale
properties. For ease of notation, $\hat f_n$ is defined with the same kernel
function $K$ as $\hat p_n$. It is of course possible to take another one,
provided that it has the same characteristics. We also point out that the
denominator of (3.1), when divided by $n$, is an estimate of $h(x)$, the
stationary distribution density.
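The following is a minimal sketch of a recursive Nadaraya-Watson type estimate, assuming it takes the weighted-ratio form $\sum_{i=1}^{n-1} i^{\beta d} K(i^{\beta}(X_i - x)) X_{i+1} / \sum_{i=1}^{n-1} i^{\beta d} K(i^{\beta}(X_i - x))$ with the convention that the estimate is 0 when the denominator vanishes. The function names and the triangular kernel are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def triangular_kernel(u):
    """Illustrative kernel: product of 1-D triangular kernels on [-1, 1]^d."""
    u = np.atleast_2d(u)
    return np.prod(np.clip(1.0 - np.abs(u), 0.0, None), axis=-1)


def recursive_nw(X, x, beta, kernel=triangular_kernel):
    """Recursive Nadaraya-Watson type estimate of f(x) from a trajectory.

    X:    array of shape (n + 1, d) whose rows are X_0, X_1, ..., X_n.
    beta: bandwidth exponent, assumed to lie in (0, 1/d).
    Returns a vector of length d (all zeros if the denominator vanishes).
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape[0] - 1, X.shape[1]
    i = np.arange(1, n, dtype=float)               # i = 1, ..., n - 1
    # Weight of the pair (X_i, X_{i+1}): i^(beta*d) * K(i^beta * (X_i - x)).
    scaled = (i ** beta)[:, None] * (X[1:n] - np.asarray(x, dtype=float))
    weights = i ** (beta * d) * kernel(scaled)
    denom = weights.sum()
    if denom == 0.0:
        return np.zeros(d)
    return (weights[:, None] * X[2:n + 1]).sum(axis=0) / denom
```

Only the pairs $(X_i, X_{i+1})$ with $i \le n - 1$ enter the estimate, so the value at time $n$ depends on $X_0, \ldots, X_n$ only, as the martingale arguments require.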

Beyond assumptions [A1] and [A2] on $f$ and $(\varepsilon_n)$, we impose the
following properties on function $f$ and the pdf $p$:

Assumption [A3]. Function $f$ belongs to $C^1(\mathbb{R}^d)$ and its first
derivatives are bounded.

Assumption [A4]. The probability density function $p$ is positive and belongs
to $C^1(\mathbb{R}^d)$, and $p$ and its first derivatives are bounded.

Furthermore, all the proofs of the paper are based on a kernel function
$K$ with the characteristics given hereafter:

Assumption [A5]. The kernel $K$ is a nonnegative function, Lipschitz,
bounded with compact support, and integrates to 1.

3.1 Strong uniform consistency of $\hat f_n$

Under Assumptions [A1]-[A5], Duflo [12] and Senoussi [23] prove the almost
sure pointwise convergence of $\hat f_n$ to $f$, as well as a pointwise central
limit theorem and results of uniform convergence on compact sets. Since the
process $(X_n)_{n \ge 0}$ is not bounded, these convergence results are not
sufficient to get (1.3). The uniform convergence of $\hat f_n$ over dilated
sets is necessary. This kind of convergence has already been established in the
regression framework by Bosq [5] under mixing conditions. For model (1.1) and
when $\varepsilon$ is Gaussian, Duflo [12] also proves that for $c$
sufficiently small there is $s > 0$ such that

$$\sup_{\|x\| \le c\sqrt{\log n}} \|\hat f_n(x) - f(x)\| = o(n^{-s}) \quad \text{a.s.}$$

In a control framework, that is, when $X_n$ is subject to the action of an
exogenous variable (also called control variable), Portier and Oulidi [21]
establish the same kind of result but with a more general noise. We will now
adapt these convergence results to model (1.1) and also improve them to study
the prediction errors, which has never been done before. In the sequel, let us
denote



$$m_n = \inf\{p(z) \, ; \ \|z\| \le v_n + R\}$$

where $(v_n)_{n \ge 0}$ is a sequence of positive real numbers increasing to
infinity and $R$ is a constant greater than
$\int_{\mathbb{R}^d} \|f(x)\| \, h(x) \, dx$.

An additional assumption on the probability density function $p$ must be
introduced to study the denominator of $\hat f_n$.

Assumption [A6]. There is a sequence of positive real numbers $(v_n)_{n \ge 1}$
increasing to infinity such that $v_n = O(n^{\nu})$ with $\nu > 0$ and



inf < a 



where $s \in ](1 + \beta d)/2, 1[$ and $\beta \in ]0, 1/d[$.

Let us mention that [A6] is not required to establish pointwise convergence
results, or uniform convergence results on compact sets, for $\hat f_n$.
Besides, when $\beta < 1/(d + 2)$, [A6] reduces to $m_n^{-1} = o(n^{\beta})$.
Indeed, in that case, we can find $s \in ](1 + \beta d)/2, 1[$ such that
$\beta = 1 - s$.

Theorem 3.1 Let $\beta \in ]0, 1/(2d)[$. Assume that [A1]-[A6] hold. Then, we have

$$\sup_{\|x\| \le v_n} \|\hat f_n(x) - f(x)\| = o\Big(\frac{n^{\lambda - 1}}{m_n}\Big) + O\Big(\frac{n^{-\beta}}{m_n}\Big) \quad \text{a.s.} \tag{3.2}$$

for any $\lambda \in ]1/2 + \beta d, 1[$. In particular, for any
$\beta < 1/(2(d + 1))$,

$$\sup_{\|x\| \le v_n} \|\hat f_n(x) - f(x)\| = O\Big(\frac{n^{-\beta}}{m_n}\Big) \quad \text{a.s.} \tag{3.3}$$

Proof: The proof is postponed to Appendix B. □



The bound given in (3.3) shows that the convergence rate of $\hat f_n$ over
dilated sets strongly depends on the decrease of the density function $p$ and
on the choice of a well-suited sequence $(v_n)$. In particular, the way the pdf
$p$ decreases at infinity has to be known to choose an appropriate sequence
$(v_n)$. For example, in the Gaussian case, it is easy to see that for any
$c \in ]0, 1[$,

$$m_n \ge \text{const.} \, \exp\big(-v_n^2 / (2c\lambda_{\min}(\Gamma))\big). \tag{3.4}$$
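The Gaussian lower bound on $m_n$ can be checked in a few lines. The derivation below is a sketch under the assumption that $p$ is the centered Gaussian density with covariance matrix $\Gamma$ and that $m_n = \inf\{p(z) ; \|z\| \le v_n + R\}$; the constant $C_\Gamma$ is notation introduced here for convenience.

```latex
\begin{align*}
p(z) &= C_\Gamma \exp\!\big(-\tfrac{1}{2}\, z^{\top} \Gamma^{-1} z\big),
  \qquad C_\Gamma = (2\pi)^{-d/2} \det(\Gamma)^{-1/2}, \\
z^{\top} \Gamma^{-1} z &\le \frac{\|z\|^2}{\lambda_{\min}(\Gamma)}
  \quad \text{(the largest eigenvalue of } \Gamma^{-1} \text{ is } 1/\lambda_{\min}(\Gamma)\text{)}, \\
m_n = \inf_{\|z\| \le v_n + R} p(z)
  &\ge C_\Gamma \exp\!\Big(-\frac{(v_n + R)^2}{2\lambda_{\min}(\Gamma)}\Big), \\
(v_n + R)^2 &\le \frac{v_n^2}{c}
  \quad \text{for } n \text{ large enough, since } v_n \to \infty \text{ and } c \in\, ]0,1[, \\
\text{hence}\quad m_n &\ge C_\Gamma \exp\!\Big(-\frac{v_n^2}{2c\,\lambda_{\min}(\Gamma)}\Big)
  \quad \text{for } n \text{ large enough.}
\end{align*}
```

The factor $c < 1$ only serves to absorb the shift by $R$; finitely many initial indices are handled by adjusting the constant.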

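Therefore, choosing $v_n = A(\log\log n)^{1/2}$ with $A > 0$, we obtain that
$m_n^{-1} = O\big((\log n)^{A^2/(2c\lambda_{\min}(\Gamma))}\big)$. Assumption [A6]
is then satisfied and we derive from (3.3) that for $\beta = 1/(2(d + 1))$ and
any $\lambda \in ]0, 1/(2(d + 1))[$,

$$\sup_{\|x\|^2 \le A^2 \log\log n} \|\hat f_n(x) - f(x)\| = o(n^{-\lambda}) \quad \text{a.s.} \tag{3.5}$$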



3.2 Average prediction error of fn 

As mentioned in the introduction, we are in fact interested in the asymptotic
average prediction error of $\hat f_n$. It is studied in the next Corollary.

Corollary 3.2 Assume that [A1]-[A6] hold. Assume also that the sequence
$(m_n)$ is decreasing. Then, for any $\beta < 1/(2(d + 1))$,

$$\frac{1}{n} \sum_{i=1}^{n} \|\hat f_i(X_i) - f(X_i)\| = O\Big(\frac{n^{-\beta}}{m_n}\Big) + O\Big(\frac{w_{1,n} + w_{2,n}}{n}\Big) \quad \text{a.s.} \tag{3.6}$$

where $w_{1,n} = \sum_{i=1}^{n} v_i^{1-m}$ and
$w_{2,n} = \sum_{i=1}^{n} i^{1/m} v_i^{-m}$ if the sequence $(n^{1/m} v_n^{-m})$
is decreasing and $w_{2,n} = n^{1/m} \sum_{i=1}^{n} v_i^{-m}$ otherwise.

Moreover, if [A2bis] holds instead of [A2], then
$w_{1,n} = \sum_{i=1}^{n} v_i \exp(-a v_i)$ for any $a < m$ and
$w_{2,n} = \sum_{i=1}^{n} (\log i) \exp(-a v_i)$ if the sequence
$((\log n) e^{-a v_n})$ is decreasing and
$w_{2,n} = (\log n) \sum_{i=1}^{n} \exp(-a v_i)$ otherwise.

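Proof: For any $n$, let us denote by $\pi_n$ the prediction error defined
by $\pi_n = \hat f_n(X_n) - f(X_n)$. To establish (3.6), we consider the
following decomposition

$$\frac{1}{n} \sum_{i=1}^{n} \|\pi_i\| = \frac{1}{n} \sum_{i=1}^{n} \|\pi_i\| \mathbf{1}_{\{\|X_i\| \le v_i\}} + \frac{1}{n} \sum_{i=1}^{n} \|\pi_i\| \mathbf{1}_{\{\|X_i\| > v_i\}}. \tag{3.7}$$

For the first term, we have

$$\frac{1}{n} \sum_{i=1}^{n} \|\pi_i\| \mathbf{1}_{\{\|X_i\| \le v_i\}} \le \frac{1}{n} \sum_{i=1}^{n} \sup_{\|x\| \le v_i} \|\hat f_i(x) - f(x)\|. \tag{3.8}$$

Now, using (3.3) and the fact that $(m_n)$ is assumed to be decreasing, we
derive that

$$\frac{1}{n} \sum_{i=1}^{n} \|\hat f_i(X_i) - f(X_i)\| \mathbf{1}_{\{\|X_i\| \le v_i\}} = O(n^{-\beta} m_n^{-1}) \quad \text{a.s.} \tag{3.9}$$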
On the other hand, thanks to (IB.OP in Appendix B, we infer that 

n n 



1=1 i=l 



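The bound of (3.10) completely depends on the moment assumption on
$(\varepsilon_n)$. Indeed, when $(\varepsilon_n)$ satisfies [A2], we have
$\varepsilon_n^* = o(n^{1/m})$ and we know by (2.2) that
$\sum_{i=1}^{n} \|X_i\|^m = O(n)$ a.s. Therefore, by Lemma A.1 with
$Z_n = \|X_n\|$ and $g(z) = z^m$, we derive that

$$\sum_{i=1}^{n} (\|X_i\| + C_{f,K}) \mathbf{1}_{\{\|X_i\| > v_i\}} = O\Big(\sum_{i=1}^{n} v_i^{1-m}\Big) \quad \text{a.s.} \tag{3.11}$$

which defines the term $w_{1,n}$. In addition, if the sequence
$(n^{1/m} v_n^{-m})$ is decreasing, according to Lemma A.1 we have

$$\sum_{i=1}^{n} \varepsilon_i^* \mathbf{1}_{\{\|X_i\| > v_i\}} = O\Big(\sum_{i=1}^{n} i^{1/m} v_i^{-m}\Big) \quad \text{a.s.} \tag{3.12}$$

which defines the term $w_{2,n}$. If the sequence $(n^{1/m} v_n^{-m})$ is not
decreasing, we obtain

$$\sum_{i=1}^{n} \varepsilon_i^* \mathbf{1}_{\{\|X_i\| > v_i\}} \le \varepsilon_n^* \sum_{i=1}^{n} \mathbf{1}_{\{\|X_i\| > v_i\}} = O\Big(n^{1/m} \sum_{i=1}^{n} v_i^{-m}\Big) \quad \text{a.s.}$$

which defines the second form of $w_{2,n}$. Hence, combining the previous
results, we obtain

$$\frac{1}{n} \sum_{i=1}^{n} \|\hat f_i(X_i) - f(X_i)\| \mathbf{1}_{\{\|X_i\| > v_i\}} = O\Big(\frac{w_{1,n} + w_{2,n}}{n}\Big) \quad \text{a.s.} \tag{3.13}$$
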
t=i ^ ' 
and the first part of Corollary 3.2 is established by combining (3.9) and
(3.13). Now, if $(\varepsilon_n)$ has a finite exponential moment of order $m$,
then $\varepsilon_n^* = O(\log n)$ a.s. and we know by (2.3) that for any
$a < m$, $\sum_{i=1}^{n} \exp(a\|X_i\|) = O(n)$ a.s. Therefore, proceeding in
the same manner, we find using Lemma A.1 that for any $a < m$,
$w_{1,n} = \sum_{i=1}^{n} v_i e^{-a v_i}$ and
$w_{2,n} = \sum_{i=1}^{n} (\log i) e^{-a v_i}$ if the sequence
$((\log n) e^{-a v_n})$ is decreasing and
$w_{2,n} = \log n \sum_{i=1}^{n} e^{-a v_i}$ otherwise. This completes the proof
of Corollary 3.2. □

Corollary 3.2 does not state that the average prediction error of $\hat f_n$
converges to 0, see (1.3). Obtaining such a result depends on the decrease at
infinity of $p$ and the existence of a well-suited sequence $(v_n)$. The moment
assumption on $(\varepsilon_n)$ gives a first indication of the possible
choices of $(v_n)$. As $(v_n)$ increases to infinity, we always have
$w_{1,n}/n = o(1)$. Therefore, the sequence $(v_n)$ must be chosen in such a way
that $w_{2,n}/n = o(1)$, which depends on the moment assumption. Moreover,
dealing with the term $n^{-\beta}/m_n$ requires knowing the way $p$ decreases.
The sequence $(v_n)$ has thus to be selected as a compromise between these two
conditions. We give below three examples that illustrate this compromise.

Example 1 Assume that the decrease at infinity of $p$ is of the form
$C\|x\|^{-\delta}$ with $C > 0$ and $\delta > 3$. Then, $(\varepsilon_n)$ has a
finite polynomial moment of order $m$ with $m \in ]2, \delta - 1[$. Let us
choose the sequence $(v_n)$ under the form $v_n = A n^{\eta}$ for some positive
constants $A$ and $\eta < \beta/\delta$. It follows that

$$m_n^{-1} = O\big((v_n + R)^{\delta}\big) = O(n^{\eta\delta})$$

and [A6] holds since $\eta < \beta/\delta$. In addition, as $m > 2$, we have
$w_{1,n} = o(n)$ and as soon as $\eta > 1/m^2$, we also have $w_{2,n} = o(n)$. Of
course, the condition $1/m^2 < \eta < \beta/\delta$ with
$\beta < 1/(2(d + 1))$ implies that $\delta$ must be sufficiently large to
ensure that $\delta/m^2 < 1/(2(d + 1))$. Thus, for such $\delta$ and any
$\beta \in ]\delta/m^2, 1/(2(d + 1))[$, the prediction errors satisfy (1.3) and,
since $\eta < \beta/\delta < 1/m$, we have $w_{1,n} = O(w_{2,n})$ and therefore

$$\frac{1}{n} \sum_{i=1}^{n} \|\hat f_i(X_i) - f(X_i)\| = O(n^{-\beta + \eta\delta}) + O(n^{-\eta m + 1/m}) \quad \text{a.s.}$$

The best rate of convergence is obtained by taking
$\eta = (\beta + 1/m)/(m + \delta)$ and we find that for any
$\beta \in ]\delta/m^2, 1/(2(d + 1))[$,

$$\frac{1}{n} \sum_{i=1}^{n} \|\hat f_i(X_i) - f(X_i)\| = O(n^{-\beta\tau}) \quad \text{a.s.} \tag{3.14}$$

with $\tau = (m - \delta/(m\beta))/(m + \delta)$. □
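To make the trade-off in the polynomial-tail example concrete, the exponents can be evaluated numerically. The helper below is only an illustrative sketch; the function name `example1_rate` and the input checks are assumptions of this sketch. It computes $\eta = (\beta + 1/m)/(m + \delta)$ and $\tau = (m - \delta/(m\beta))/(m + \delta)$, for which the two exponents $-\beta + \eta\delta$ and $-\eta m + 1/m$ coincide and equal $-\beta\tau$.

```python
def example1_rate(m, delta, beta, d):
    """Return (eta, tau) for the polynomial-tail example.

    Assumed constraints (checked below): 2 < m < delta - 1 and
    delta / m^2 < beta < 1 / (2 * (d + 1)).
    """
    if not (2 < m < delta - 1):
        raise ValueError("need 2 < m < delta - 1")
    if not (delta / m**2 < beta < 1.0 / (2 * (d + 1))):
        raise ValueError("need delta/m^2 < beta < 1/(2(d+1))")
    # eta balances the exponents -beta + eta*delta and -eta*m + 1/m.
    eta = (beta + 1.0 / m) / (m + delta)
    tau = (m - delta / (m * beta)) / (m + delta)
    return eta, tau


# Illustrative values only: d = 1, m = 20, delta = 30, beta = 0.2.
eta, tau = example1_rate(m=20, delta=30, beta=0.2, d=1)
```

With these illustrative values the common exponent is $-\beta\tau = -0.05$, which shows how slowly the average prediction error may decay when the tails of $p$ are polynomial.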

Example 2 Assume the decrease at infinity of $p$ is of the form
$C \exp(-\delta\|x\|)$ with $C > 0$ and $\delta > 0$. Thus, $(\varepsilon_n)$
has a finite exponential moment of order $m < \delta$. Let us choose the
sequence $(v_n)$ under the form $v_n = \eta \log n$ with $\eta > 0$. Then,
$w_{1,n} + w_{2,n} = O(n^{1 - a\eta} \log n)$ with $a < m$,
$m_n^{-1} = O(n^{\eta\delta})$ and [A6] holds as soon as $\eta < \beta/\delta$.
In that case, the prediction errors satisfy (1.3). The best bound is obtained
by choosing $\eta = \beta/(2m)$: for $\beta < 1/(2(d + 1))$, result (3.14)
stands for any $\tau < 1/2$. □

Example 3 Assume now that $(\varepsilon_n)$ is Gaussian with covariance matrix
$\Gamma$. Thanks to Remark 2.2 we know that
$\sum_{i=1}^{n} \exp(a\|X_i\|^2) = O(n)$ where
$a = (1 - r_f)/(2\lambda_{\min}(\Gamma))$, and
$\varepsilon_n^* = O(\sqrt{\log n})$ a.s. Hence, following the proof of
Corollary 3.2, we find that result (3.6) holds with
$w_{1,n} = \sum_{i=1}^{n} v_i e^{-a v_i^2}$ and
$w_{2,n} = \sqrt{\log n} \sum_{i=1}^{n} e^{-a v_i^2}$. Therefore, taking
$v_n = (\eta \log n)^{1/2}$ with $\eta > 0$ leads to
$w_{1,n} + w_{2,n} = O(\sqrt{\log n} \, n^{1 - a\eta}) = o(n)$. In addition, using
(3.4), we derive that $m_n^{-1} = O(n^{\eta/(2c\lambda_{\min}(\Gamma))})$ for any
$c \in ]0, 1[$. Then [A6] is fulfilled by choosing
$\eta < 2\beta c\lambda_{\min}(\Gamma)$ and the prediction errors satisfy (1.3).
Moreover, choosing $\eta = 2\beta c\lambda_{\min}(\Gamma)/(1 + c(1 - r_f))$
gives the best rate of convergence: for any $\beta < 1/(2(d + 1))$ and any
$c \in ]0, 1[$, result (3.14) stands for
$\tau = c(1 - r_f)/(1 + c(1 - r_f)) < 1/2$. □

4 Asymptotic properties of the KDE 

This section is concerned with the asymptotic properties of the KDE $\hat p_n$.
More precisely, we present the uniform strong consistency with rate for
$\hat p_n$ as well as a pointwise and multivariate central limit theorem.

4.1 Strong consistency

The following Theorem 4.1 gives the conditions that ensure the uniform
strong consistency of $\hat p_n$.

Theorem 4.1 Assume that [A1]-[A5] hold true. If there is a sequence $(v_n)$
of the form $v_n = A n^{\eta}$ with $A > 0$ and $\eta > 1/m^2$ such that the
sequence $(m_n)$ is decreasing and satisfies $m_n^{-1} = o(n^{\beta})$ where
$\beta < 1/(2(d + 1))$, then almost surely
$\sup_{y \in \mathbb{R}^d} |\hat p_n(y) - p(y)| = o(1)$ and more precisely,

sup \pniy) - piy)\ = o(n^-i) + 0(n-") + O f ^) + O (wn) (4.1) 

where $w_n = n^{-\eta m + 1/m}$ if $\eta < 1/m$ and $w_n = n^{-\eta(m - 1)}$
otherwise. Moreover, if [A2bis] holds instead of [A2] and if there is a
sequence $(v_n)$ of the form $v_n = \eta \log n$ with $\eta > 0$ such that the
sequence $(m_n)$ is decreasing and satisfies $m_n^{-1} = o(n^{\beta})$ where
$\beta < 1/(2(d + 1))$, then $\hat p_n$ satisfies (4.1) with
$w_n = n^{-a\eta} \log n$ and $a < m$.

Proof: The proof is postponed to Appendix C. □

Theorem 4.1 directly establishes the convergence of the estimation error
of $p$ uniformly on the whole of $\mathbb{R}^d$, that is, without first studying
what happens on fixed compact sets or at fixed $y$. The reason is that whatever
set of points $y$ is considered, one always has to verify the assumptions
required for the convergence (1.3) of the average prediction error of
$\hat f_n$, see Section 3. These assumptions are strong enough to directly
obtain uniform strong consistency properties of $\hat p_n$ without additional
hypotheses.

The convergence of $\hat p_n$ thus relies on the choice of the sequence $(v_n)$,
which depends on the way $p$ decreases at infinity. The question is then:
"Which pdfs are consistently estimated by $\hat p_n$?" We have exhibited three
families of densities (see the examples in §3.2): densities with polynomial or
exponential decrease at infinity and Gaussian densities. Let us specify the
different results for these three examples.

Example 1 (continued) Recall that the best rate of convergence for the
average prediction errors is obtained by choosing the sequence $(v_n)$ under
the form $v_n = A n^{\eta}$ with $A > 0$, $\eta = (\beta + 1/m)/(m + \delta)$ and
$\beta \in ]\delta/m^2, 1/(2(d + 1))[$. With this choice, we obtain:

sup \pn{y) - p{y)\ = o{n<-^) + 0(n-") + O [n'^^) (4.2) 

with $\tau = (m - \delta/(m\beta))/(m + \delta)$. □

The following remark shows how the convergence rate can be improved and the
constraint on $\delta$ relaxed in the polynomial case, by considering another
KDE of $p$.

Remark 4.2 The main difficulty in proving the consistency of $\hat p_n$ comes
from the study of the prediction errors $\hat f_n(X_n) - f(X_n)$, and in
particular from the term
$\sum_{i=1}^{n} \|\hat f_i(X_i) - f(X_i)\| \mathbf{1}_{\{\|X_i\| > v_i\}}$ (see
Corollary 3.2). This term is studied using a crude upper bound involving
$\varepsilon_n^*$. In the framework of [A2], this upper bound led us to
introduce additional and restrictive constraints on $\eta$ and $\beta$ for
proving that $w_{2,n} = o(n)$. One way to avoid the study of such a term is to
adapt the plug-in estimator $\hat p_n$ with truncated residuals
$\hat\varepsilon_n^{\,*} = X_n - \hat f_{n-1}(X_{n-1}) \mathbf{1}_{\{\|X_{n-1}\| \le v_{n-1}\}}$,
leading to the following truncated version of the kernel density estimator:

$$\hat p_n^{\,*}(y) = \frac{1}{n} \sum_{i=1}^{n} i^{\alpha d} K\Big(i^{\alpha}\big(X_i - \hat f_{i-1}(X_{i-1}) \mathbf{1}_{\{\|X_{i-1}\| \le v_{i-1}\}} - y\big)\Big). \tag{4.3}$$


The sequence $(v_n)$ still has to satisfy assumption [A6] to provide the
uniform almost sure convergence of $\hat f_n$ over dilated sets. In the context
of [A2], we choose $v_n$ of the form $v_n = A n^{\eta}$ with $A > 0$ and
$\eta > 0$. Following the proof of Theorem 4.1, we easily show that under
[A1]-[A5]

sup my) - P{y) I = oin-^-') + 0{n-") + O f"^) + O (n-^'""^)^) a.s. 

if $(v_n)$ is such that $(m_n)$ is decreasing and $m_n^{-1} = o(n^{\beta})$. In
particular, in the context of Example 1, the best rate of convergence is
obtained by taking $\eta = \beta/(m + \delta - 1)$ with $\beta < 1/(2(d + 1))$.
With that choice, the KDE $\hat p_n^{\,*}$ satisfies (4.2) with
$\tau = (m - 1)/(m + \delta - 1)$, which yields a better convergence rate than
that of $\hat p_n$ since
$(m - 1)/(m + \delta - 1) \gg (m - \delta/(m\beta))/(m + \delta)$. This result
has moreover been obtained by only assuming that $\delta > 3$.
Of course, the main drawback of this KDE based on truncated residuals lies in
its use in practice, as the sequence $(v_n)$ must be suitably chosen.


Example 2 (continued) When the decrease of $p$ is exponential, the KDE
$\hat p_n$ satisfies (4.2) for any $\tau < 1/2$. □

Example 3 (continued) In the Gaussian case, the KDE $\hat p_n$ satisfies (4.2)
with $\tau = c(1 - r_f)/(1 + c(1 - r_f))$ for any $c \in ]0, 1[$. □
4.2 Central limit theorem

In this section, we present a pointwise and multivariate central limit theorem
for $\hat p_n$.

Theorem 4.3 Assume that [A1]-[A5] hold. Assume also that there is a sequence
$(v_n)$ of the form $v_n = A n^{\eta}$ with $A > 0$ and
$\eta \in ]1/m^2, 1/m^2 + 1/(m(d + 2))[$ such that the sequence $(m_n)$ is
decreasing and satisfies

$$n^{(1 - \alpha d - 2\beta)/2} \, m_n^{-1} = o(1) \tag{4.4}$$

for some $\alpha \in ](1 - 2(m\eta - 1/m))/d, 1/d[$ and
$\beta \in ]0, 1/(2(d + 1))[$. Then, for any $y \in \mathbb{R}^d$,

$$Z_n(y) = \sqrt{n^{1 - \alpha d}} \, \big(\hat p_n(y) - p(y)\big) \xrightarrow[n \to \infty]{\mathcal{L}} \mathcal{N}\Big(0, \frac{\|K\|_2^2 \, p(y)}{1 + \alpha d}\Big) = Z(y) \tag{4.5}$$

where $\|K\|_2^2 = \int K^2(t) \, dt$. In addition, for $q$ distinct points
$y_1, \ldots, y_q$ of $\mathbb{R}^d$, we also have

$$(Z_n(y_1), \ldots, Z_n(y_q)) \xrightarrow[n \to \infty]{\mathcal{L}} (Z(y_1), \ldots, Z(y_q)) \tag{4.6}$$

where $Z(y_1), \ldots, Z(y_q)$ are independent.

Moreover, if [A2bis] holds instead of [A2], the KDE $\hat p_n$ satisfies the
pointwise and multivariate CLT if there is a sequence $(v_n)$ of the form
$v_n = \eta \log n$ with $\eta \in ]0, 1/(2a)[$ and $a < m$, thus yielding that
the condition (4.4) is fulfilled for some
$\alpha \in ](1 - 2a\eta)/d, 1/d[$.

Proof: The proof is postponed to Appendix C. □

In the following we show that the three families of densities, for which we
proved the consistency of $\hat p_n$, also benefit from the CLT.

Example 1 (continued) We know that $m_n^{-1} = O(n^{\eta\delta})$. Hence,
condition (4.4) is fulfilled as soon as
$\alpha d > 1 - 2(\beta - \eta\delta)$ with $\eta < \beta/\delta$. The two
constraints on the bandwidth parameter $\alpha$ are the same if we take
$\eta = (\beta + 1/m)/(m + \delta)$ with
$\beta \in ]\delta/m^2, 1/(2(d + 1))[$. Therefore, the KDE $\hat p_n$ satisfies
the pointwise and multivariate CLT for any $\alpha > (1 - 2\beta\tau)/d$ with
$\tau = (m - \delta/(m\beta))/(m + \delta)$.
In this context, the rate of convergence can be improved using the truncated
KDE $\hat p_n^{\,*}$. Indeed, from Remark 4.2 it is easy to see that
$\hat p_n^{\,*}$ satisfies the CLT under the conditions (4.4) and
$n^{(1 - \alpha d)/2} \, n^{-(m - 1)\eta} = o(1)$. Hence, taking
$\eta = \beta/(m + \delta - 1)$, with $\beta < 1/(2(d + 1))$, implies that
$\hat p_n^{\,*}$ satisfies the CLT for any $\alpha > (1 - 2\beta\tau)/d$ with
$\tau = (m - 1)/(m + \delta - 1)$. The pointwise convergence rate of
$\hat p_n^{\,*}$ is better than that of $\hat p_n$ since
$(m - 1)/(m + \delta - 1) > (m - \delta/(m\beta))/(m + \delta)$. In addition,
the constraints on $\delta$ and $\beta$ are considerably relaxed since we only
have to assume that $\delta > 3$ instead of assuming that $\delta$ is
sufficiently large to ensure that $\delta/m^2 < 1/(2(d + 1))$. □

Example 2 (continued) The KDE $\hat p_n$ satisfies the CLT for any
$\alpha > (1 - 2\beta\tau)/d$ with $\beta \in ]0, 1/(2(d + 1))[$ and
$\tau < 1/2$. □

Example 3 (continued) In the Gaussian case, the KDE $\hat p_n$ satisfies the
CLT for any $\alpha > (1 - 2\beta\tau)/d$ with
$\tau = (1 - r_f)/(1 - r_f + 1/c)$ and $c \in ]0, 1[$.
The best rate of convergence in the CLT is obtained when $f$ is bounded.
Indeed, in this case, $r_f = 0$ and $\tau = c/(1 + c)$ with $c \in ]0, 1[$.
Therefore the constant $\tau$ involved in the condition on the bandwidth
parameter $\alpha$ can be set as close to 1/2 as desired. This allows the
choice of the smallest value of $\alpha$, thus yielding the best convergence
rate that can be obtained for the KDE. When $f$ is not bounded and $r_f$ is
close to 1, the convergence rate obtained in the Gaussian case is far from this
"best" rate. □

Remark 4.4 In the three previous examples, we have seen that the bandwidth
parameter $\alpha$, which governs the convergence rate in the CLT, must satisfy
the condition $\alpha > (1 - 2\beta\tau)/d$ with $\tau < 1/2$ and
$\beta < 1/(2(d + 1))$. Therefore, since $(1 - 2\beta\tau)/d \gg 1/(d + 2)$, we
clearly have a big loss in the convergence rate in comparison with the sample
model (i.e. $f = 0$) or, for example, with the AR(1) model (i.e.
$f(x) = \theta x$). Indeed, in these two cases, we can establish the same CLT
for a kernel density estimator of $p$ under the condition
$\alpha > 1/(d + 2)$.

5 Conclusion 

The strong consistency with rate as well as the pointwise and multivariate
asymptotic normality have been established for a kernel estimator $\hat p_n$ of
the noise density in a functional autoregressive model. This estimator is based
on a predictor sequence of the noise constructed from a nonparametric estimator
$\hat f_n$ of the autoregression function. The properties of the KDE depend on
the convergence of the average prediction error of $\hat f_n$, which
complicates the study of $\hat p_n$ and the determination of its convergence
rate. Indeed, it requires knowledge of the way the density decreases at
infinity as well as the choice of a well-suited sequence which controls the
convergence of the prediction errors of $\hat f_n$. We have at least exhibited
three families of densities which are consistently estimated by $\hat p_n$ and
benefit from the central limit theorem. Nevertheless, from a practical point of
view, the convergence results ensure that the KDE $\hat p_n$ may behave well in
many situations.



Appendix A 

This appendix is devoted to two technical lemmas useful in the different 
proofs of the paper. 

Lemma A.1 Let $(Z_n)$ be a sequence of positive real random variables and
let $(v_n)$ be a sequence of positive real numbers increasing to infinity. Assume
that $\sum_{i=1}^{n} g(Z_i) = O(n)$ a.s. for some increasing positive function $g : \mathbb{R} \to \mathbb{R}$.
Let $b > 0$. Then, if the sequence $(v_n^{b} / g(v_n))$ is decreasing,

n n fj 

i=i i=i ^' 

Moreover, if $(a_n)$ is a sequence of positive real numbers increasing to infinity
such that the sequence $(a_n / g(v_n))$ is decreasing, then



n n 

j=i i=i ^' 



Proof : The proof is very standard and is therefore omitted here. The 
proof of the first part can be found in [21] for example. The proof of the 
second part is a direct adaptation of the first one. □ 

In the sequel, let $\mathcal{F}_n$ denote the $\sigma$-algebra of the events occurring
up to time $n$, i.e. $\mathcal{F}_n = \sigma(X_0, \varepsilon_1, \ldots, \varepsilon_n)$.

Lemma A.2 In the context of model (1.1), assume that [A1]-[A4] hold true.
Let $(U_n)_{n \ge 0}$ be a sequence of random vectors adapted to the filtration $(\mathcal{F}_n)_{n \ge 0}$.
For any $x \in \mathbb{R}^d$ and $n \ge 1$, let us define

$$G_n(x) = \sum_{i=1}^{n} \Big( i^{b} K\big(i^{a}(X_i - U_{i-1} - x)\big) - i^{b-ad}\, p\big(x + U_{i-1} - f(X_{i-1})\big) \Big)$$

where $K$ is a kernel satisfying [A5], $b \in\, ]0, 1[$ and $a \in\, ]0, 1/d[$. Then, for any
constants $A > 0$ and $\nu > 0$,

$$\sup_{\|x\| \le A n^{\nu}} |G_n(x)| = o(n^{s}) + O(n^{1+b-ad-a}) \quad \text{a.s.} \qquad \text{(A.1)}$$

where $s \in\, ](1 + 2b - ad)/2, 1[$.

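In addition, if $\sum_{i=1}^{n} \|U_{i-1} - f(X_{i-1})\| = o(n)$ a.s. and if $a > 1/(d + 2)$, then
for any $x \in \mathbb{R}^d$,

$$n^{-(1+2b-ad)/2}\, G_n(x) \xrightarrow[n \to \infty]{\mathcal{L}} \mathcal{N}\big(0, (1 + 2b - ad)^{-1} \|K\|_2^2\, p(x)\big) = G(x) \qquad \text{(A.2)}$$

and for two distinct points $x$, $y$ of $\mathbb{R}^d$,

$$n^{-(1+2b-ad)/2} \begin{pmatrix} G_n(x) \\ G_n(y) \end{pmatrix} \xrightarrow[n \to \infty]{\mathcal{L}} \begin{pmatrix} G(x) \\ G(y) \end{pmatrix} \qquad \text{(A.3)}$$

where $G(x)$ and $G(y)$ are independent.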

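Proof : For any $y \in \mathbb{R}^d$, let us denote $K_i(y) = K(i^{a} y)$ and let us decompose
$G_n(x)$ in the form $G_n(x) = M_n(x) + R_n(x)$ where

$$M_n(x) = \sum_{i=1}^{n} i^{b} \Big( K_i(X_i - U_{i-1} - x) - E\big[K_i(X_i - U_{i-1} - x) \mid \mathcal{F}_{i-1}\big] \Big),$$

$$R_n(x) = \sum_{i=1}^{n} \Big( i^{b} E\big[K_i(X_i - U_{i-1} - x) \mid \mathcal{F}_{i-1}\big] - i^{b-ad}\, p\big(x + U_{i-1} - f(X_{i-1})\big) \Big).$$

For any $x \in \mathbb{R}^d$, $(M_n(x))$ is a square integrable real martingale. Its increasing
process $(\langle M(x) \rangle_n)$ is defined by

$$\langle M(x) \rangle_n = \sum_{i=1}^{n} i^{2b} E\big[K_i^2(X_i - U_{i-1} - x) \mid \mathcal{F}_{i-1}\big] - \sum_{i=1}^{n} \Big( i^{b} E\big[K_i(X_i - U_{i-1} - x) \mid \mathcal{F}_{i-1}\big] \Big)^2 \qquad \text{(A.4)}$$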
i=l 
First of all, for all $x \in \mathbb{R}^d$, set $\Delta M_n(x) = M_n(x) - M_{n-1}(x)$. Since the kernel
$K$ is bounded and Lipschitz, for all $\delta \in\, ]0, 1[$, one can find some positive
constant $C_\delta$ such that, for any $x, y \in \mathbb{R}^d$,

$$|K(x) - K(y)| \le C_\delta \|x - y\|^{\delta}. \qquad \text{(A.5)}$$

Now, following the same development and using the same arguments as in the
proof of Lemma A.2 in Bercu and Portier [3], there are positive constants
$c_1, c_2, c_3$ and $c_4$ such that for all $n \ge 1$, $|\Delta M_n(0)| \le c_1 n^{b}$ and
$\langle M(0) \rangle_n \le c_2 n^{1+2b-ad}$, and for any $x, y \in \mathbb{R}^d$ and any $\delta \in\, ]0, 1[$,

$$|\Delta M_n(x) - \Delta M_n(y)| \le c_3 \|x - y\|^{\delta}\, n^{b + a\delta}$$
<M{x)-M{y)>n < C4||x - 

Finally, as the power $\delta$ can be chosen as small as one wishes, all four conditions
of Theorem 6.4.34 of [12] are fulfilled. Consequently, for any constants
$A > 0$ and $\nu > 0$,

$$\sup_{\|x\| \le A n^{\nu}} |M_n(x)| = o(n^{s}) \quad \text{a.s.} \qquad \text{(A.6)}$$

for any $s \in\, ](1 + 2b - ad)/2, 1[$.

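Let us now study the remainder $R_n(x)$. For any $i \ge 1$, we have

$$E\big[K_i(X_i - U_{i-1} - x) \mid \mathcal{F}_{i-1}\big] = \int K\big(i^{a}(v + f(X_{i-1}) - U_{i-1} - x)\big)\, p(v)\, dv,$$

and via the change of variables $t = i^{a}(v + f(X_{i-1}) - U_{i-1} - x)$, we deduce
that

$$R_n(x) = \sum_{i=1}^{n} i^{b-ad} \int K(t) \Big( p\big(i^{-a} t + x + U_{i-1} - f(X_{i-1})\big) - p\big(x + U_{i-1} - f(X_{i-1})\big) \Big)\, dt.$$

Then, since by [A4] the gradient of $p$ is bounded, we deduce by a Taylor
expansion that

$$\sup_{x \in \mathbb{R}^d} |R_n(x)| = O(n^{1+b-ad-a}) \quad \text{a.s.} \qquad \text{(A.7)}$$

which, combined with (A.6), ends the proof of (A.1).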
Let us now establish (A.2). From (A.7), we immediately deduce that

$$n^{-(1+2b-ad)/2}\, |R_n(x)| = o(1) \quad \text{a.s.} \qquad \text{(A.8)}$$

as soon as $a > 1/(d + 2)$. Therefore, it remains to establish that

$$n^{-(1+2b-ad)/2}\, M_n(x) \xrightarrow[n \to \infty]{\mathcal{L}} \mathcal{N}\big(0, (1 + 2b - ad)^{-1} \|K\|_2^2\, p(x)\big). \qquad \text{(A.9)}$$

In order to make use of the CLT for martingales (see e.g. [12], Corollary
2.1.10, p. 46), we have to study the asymptotic behaviour of $\langle M(x) \rangle_n$ and
to check that Lindeberg's condition is satisfied.

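Starting from (A.4), we can rewrite $\langle M(x) \rangle_n$ in the form

$$\langle M(x) \rangle_n = A_n(x) + p(x) \|K\|_2^2 \sum_{i=1}^{n} i^{2b-ad} + o(n^{1+2b-ad}) \qquad \text{(A.10)}$$

where

$$A_n(x) = \sum_{i=1}^{n} i^{2b-ad} \int K^2(t) \Big( p\big(i^{-a} t + x + U_{i-1} - f(X_{i-1})\big) - p(x) \Big)\, dt.$$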

Using one more time the fact that the gradient of p is bounded, we deduce 
that 



1=1 



Finally, since $n^{-(1+2b-ad)} (1 + 2b - ad) \sum_{i=1}^{n} i^{2b-ad} \to 1$ as $n \to \infty$ and we have
assumed that $\sum_{i=1}^{n} \|U_{i-1} - f(X_{i-1})\| = o(n)$, we obtain that

$$n^{-(1+2b-ad)} \langle M(x) \rangle_n \xrightarrow[n \to \infty]{} (1 + 2b - ad)^{-1} \|K\|_2^2\, p(x) \qquad \text{(A.11)}$$

which defines the variance in the CLT. Lindeberg's condition is easily
obtained following the proof of Lemma B.1 in [3].
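
The normalisation used in the limit above can be checked numerically. The following Python sketch evaluates $(1 + 2b - ad)\, n^{-(1+2b-ad)} \sum_{i=1}^{n} i^{2b-ad}$ for increasing $n$; the function name `normalised_power_sum` and the parameter values are illustrative assumptions and do not come from the paper.

```python
def normalised_power_sum(n, a, b, d):
    """Return (1 + c) * n**(-(1 + c)) * sum_{i=1}^n i**c, with c = 2b - a*d."""
    c = 2.0 * b - a * d
    if 1.0 + c <= 0.0:
        raise ValueError("this check assumes 1 + 2b - ad > 0")
    total = sum(i ** c for i in range(1, n + 1))
    return (1.0 + c) * total / n ** (1.0 + c)


if __name__ == "__main__":
    # Illustrative values only: d = 1, a = 0.4 and b = a * d.
    for n in (10, 1000, 100000):
        print(n, normalised_power_sum(n, a=0.4, b=0.4, d=1))
```

The printed ratios should approach 1 as $n$ grows, which is the property used to identify the limiting variance.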
To establish (A.3), it is enough to prove

which, following the same lines as in [3], consists in showing that

$$\lim_{n \to \infty} n^{-(1+2b-ad)} \sum_{i=1}^{n} E\big[\Delta M_i(x)\, \Delta M_i(y) \mid \mathcal{F}_{i-1}\big] = 0 \quad \text{a.s.} \qquad \text{(A.13)}$$

For all $i \ge 1$, we have



$p\big(i^{-a} t + x + U_{i-1} - f(X_{i-1})\big)\, dt.$

Therefore, as the gradient of $p$ is bounded and $\sum_{i=1}^{n} \|U_{i-1} - f(X_{i-1})\| = o(n)$,
we derive that

$$\sum_{i=1}^{n} E\big[\Delta M_i(x)\, \Delta M_i(y) \mid \mathcal{F}_{i-1}\big] \le Q_n(x, y) + O(n^{1+2b-ad-a}) + o(n^{1+2b-ad})$$

where $Q_n(x, y) = \sum_{i=1}^{n} i^{2b-ad}\, p(x) \int K(t)\, K\big(t + i^{a}(x - y)\big)\, dt$.

However, using the fact that $K$ is compactly supported, we deduce that for
$i$ large enough, the integral term in $Q_n(x, y)$ is zero, which yields (A.13). □
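
The compact-support argument can be illustrated in dimension one. The sketch below assumes an Epanechnikov kernel supported on $[-1, 1]$ (an illustrative choice; the paper only requires [A5]) and uses hypothetical helper names. It finds the first index $i$ for which $i^{a}|x - y|$ exceeds the width of the support, after which the integral of $K(t)K(t + i^{a}(x - y))$ vanishes.

```python
def epanechnikov(u):
    """Kernel supported on [-1, 1]; an illustrative choice of K."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def overlap_integral(i, a, x, y, step=1e-3):
    """Riemann-sum approximation of the integral of K(t) K(t + i**a (x - y)) over [-1, 1]."""
    shift = (i ** a) * (x - y)
    n_steps = int(round(2.0 / step))
    return step * sum(
        epanechnikov(-1.0 + k * step) * epanechnikov(-1.0 + k * step + shift)
        for k in range(n_steps + 1)
    )


def first_index_without_overlap(a, x, y):
    """Smallest i with i**a * |x - y| > 2, the width of the support [-1, 1]."""
    if x == y:
        raise ValueError("x and y must be distinct")
    i = 1
    while (i ** a) * abs(x - y) <= 2.0:
        i += 1
    return i
```

For two distinct points only finitely many terms of $Q_n(x, y)$ are nonzero under this assumption, so $Q_n(x, y)$ stays bounded while the normalisation $n^{1+2b-ad}$ diverges.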



Appendix B 

This appendix is concerned with the proof of Theorem 3.1. As already mentioned
in paragraph 2.3, the uniform convergence of $\hat f_n$ over dilated sets has
already been studied by Portier and Oulidi [21] in a controlled framework,
i.e. for models of the form $X_{n+1} = f(X_n) + U_n + \varepsilon_{n+1}$. We follow the same
steps of the proof, adapting and slightly improving the results in the context
of model (1.1). Starting from the definition of $\hat f_n(x)$, we can write

$$\hat f_n(x) - f(x) = \frac{M_n(x) + R_{n-1}(x)}{H_{n-1}(x)}\, 1_{\{H_{n-1}(x) \ne 0\}} - f(x)\, 1_{\{H_{n-1}(x) = 0\}} \qquad \text{(B.1)}$$

with

$$M_n(x) = \sum_{i=1}^{n-1} i^{\beta d} K\big(i^{\beta}(X_i - x)\big)\, \varepsilon_{i+1},$$

$$R_{n-1}(x) = \sum_{i=1}^{n-1} i^{\beta d} K\big(i^{\beta}(X_i - x)\big) \big(f(X_i) - f(x)\big),$$

$$H_{n-1}(x) = \sum_{i=1}^{n-1} i^{\beta d} K\big(i^{\beta}(X_i - x)\big).$$

• Although the process $(X_n)$ is not ruled by the same equation, $M_n(x)$ is
treated as in [21]. So, we only give the result, that is, for any $A < \infty$ and
$\nu > 0$, and for all $s > 1/2 + \beta d$ with $\beta < 1/(2d)$,

$$\sup_{\|x\| \le A n^{\nu}} \|M_n(x)\| = o(n^{s}) \quad \text{a.s.} \qquad \text{(B.2)}$$

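• Let us now study $R_n$. Since by [A3] the gradient of $f$ is bounded and the kernel
$K$ has compact support, there is a positive constant $c$ such that
$K(i^{\beta}(X_i - x)) \|f(X_i) - f(x)\| \le c\, i^{-\beta} K(i^{\beta}(X_i - x))$. Thus, it follows that

$$\|R_n(x)\| = O\Big( \sum_{i=1}^{n} i^{\beta d - \beta} K\big(i^{\beta}(X_i - x)\big) \Big) \qquad \text{(B.3)}$$

$$= O\Big( G_{1,n}(x) + \sum_{i=1}^{n} i^{-\beta}\, p\big(x - f(X_{i-1})\big) \Big)$$

where $G_{1,n}(x)$ stands for $G_n(x)$ in Lemma A.2 with $b = \beta d - \beta$, $a = \beta$ and
$U_i = 0$. Now, since $p$ is bounded and using Lemma A.2, we obtain that, for
any $A < \infty$ and $\nu > 0$, and for all $s' > (1 + \beta(d - 2))/2$,

$$\sup_{\|x\| \le A n^{\nu}} \|R_n(x)\| = o(n^{s'}) + O(n^{1-\beta}) = O(n^{1-\beta}) \quad \text{a.s.} \qquad \text{(B.4)}$$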

• The term $H_{n-1}(x)$ remains to be studied. To this aim, let us write $H_n(x)$
in the form $G_{2,n}(x) + \sum_{i=1}^{n} p(x - f(X_{i-1}))$ where $G_{2,n}(x)$ stands for $G_n(x)$
in Lemma A.2 with $b = \beta d$, $a = \beta$ and $U_i = 0$.

Let R be such that / ||/(t)||/;,(t)dt < R < oo and let (f„) be a sequence 
of positive real numbers increasing to infinity such that f„ = 0{n^) for any 
f > For a; G M'' such that ||x|| < we have 

^ n n—l 

■ ^ > '^Y.Mmx.mR} (B.5) 

i=l i=0 
which, combined with (2.13), yields that

$$\liminf_{n \to \infty}\, \inf_{\|x\| \le v_n} \frac{1}{n\, m_n} \sum_{i=1}^{n} p\big(x - f(X_{i-1})\big) > 0 \quad \text{a.s.} \qquad \text{(B.6)}$$

In addition, using Lemma A.2 applied to $G_{2,n}(x)$ and the fact that by [A6],
= inf (^0{n^~^),o{n'^)), we derive that for any positive constants A and 

sup ■ = 0(1) a.s. (B.7) 



Now, since 



inf 



> inf sup 



\\x\\<vn nrun \M<vnnmn'^ \\x\\<An'' nrun 

1=1 II II— 

we infer using (B.6) together with (B.7) that

$$\liminf_{n \to \infty}\, \frac{1}{n\, m_n} \inf_{\|x\| \le v_n} H_{n-1}(x) > 0 \quad \text{a.s.} \qquad \text{(B.8)}$$

Finally, from (B.1), unifying the results obtained for $M_n(x)$, $R_n(x)$ and $H_n(x)$
completes the proof of the first part of Theorem 3.1.

The second part immediately follows by noting that when $\beta < 1/(2(d + 1))$,
we can choose $\lambda \in\, ]1/2 + \beta d, 1[$ such that $\lambda = 1 - \beta$.
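
The last step is a one-line inequality: $1 - \beta > 1/2 + \beta d$ holds exactly when $\beta(d + 1) < 1/2$. A small Python sketch of this check, with a hypothetical helper name `lambda_choice`, might read:

```python
def lambda_choice(beta, d):
    """Return lambda = 1 - beta if it lies in ]1/2 + beta*d, 1[, and None otherwise."""
    if beta <= 0.0 or d < 1:
        raise ValueError("expected beta > 0 and an integer dimension d >= 1")
    lam = 1.0 - beta
    return lam if 0.5 + beta * d < lam < 1.0 else None


if __name__ == "__main__":
    # The choice is possible below the threshold 1 / (2 (d + 1)) and fails above it.
    print(lambda_choice(0.10, 1))  # threshold is 0.25 for d = 1
    print(lambda_choice(0.30, 1))
```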

Remark B.1 From the proof of Theorem 3.1, we deduce an interesting
upper bound of the estimation error: for all $x$ in $\mathbb{R}^d$ and $n \ge 1$,

$$\|\hat f_n(x) - f(x)\| \le C_{f,K} + \|x\| + \varepsilon_n^{*} \quad \text{a.s.} \qquad \text{(B.9)}$$

where $\varepsilon_n^{*} := \sup_{i \le n} \|\varepsilon_i\|$ and $C_{f,K}$ is a constant depending on $K$ and $f$. This
upper bound will be useful for the study of the average prediction error of $\hat f$
in the proof of Corollary 3.2 for example. It directly follows from the decomposition
(B.1): the constant $c_f$ comes from the study of $R_{n-1}(x) / H_{n-1}(x)$
and (B.3), the term $\varepsilon_n^{*}$ from the study of $M_n(x) / H_{n-1}(x)$.

Appendix C

This appendix is devoted to proving Theorem 4.1 and Theorem 4.3. Starting
from (1.2), we infer that for all $y \in \mathbb{R}^d$ and $n \ge 1$,

$$\hat p_n(y) - p(y) = \frac{1}{n} \big( G_{3,n}(y) + B_n(y) \big) \qquad \text{(C.1)}$$

where $B_n(y) = \sum_{i=1}^{n} \Big( p\big(y + \hat f_{i-1}(X_{i-1}) - f(X_{i-1})\big) - p(y) \Big)$ and $G_{3,n}$ stands for
$G_n$ in Lemma A.2 with $b = \alpha d$, $a = \alpha$ and $U_i = \hat f_i(X_i)$.
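
To make the objects in this decomposition concrete, here is a minimal Python sketch in dimension $d = 1$. It assumes that the estimators take the recursive kernel form suggested by Appendix B and by the choice $b = \alpha d$, $a = \alpha$ above, namely a weighted average with weights $i^{\beta} K(i^{\beta}(X_i - x))$ for the regression function and $\frac{1}{n}\sum_i i^{\alpha} K(i^{\alpha}(\cdot))$ applied to the residuals for the density. The Epanechnikov kernel, the function $f = \sin$, the Gaussian noise and all function names are illustrative assumptions, not the authors' code.

```python
import math
import random


def epanechnikov(u):
    """Compactly supported kernel on [-1, 1] integrating to 1 (illustrative choice)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def simulate_path(n, f, seed=0):
    """Simulate X_{i+1} = f(X_i) + eps_{i+1}, X_0 = 0, with standard Gaussian noise."""
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(n):
        path.append(f(path[-1]) + rng.gauss(0.0, 1.0))
    return path


def regression_estimate(path, m, x, beta):
    """Kernel estimate of f(x) from the pairs (X_j, X_{j+1}), j = 1..m-1 (0 if no weight)."""
    num = den = 0.0
    for j in range(1, m):
        w = (j ** beta) * epanechnikov((j ** beta) * (path[j] - x))
        num += w * path[j + 1]
        den += w
    return num / den if den > 0.0 else 0.0


def residuals(path, beta):
    """Predicted noises X_i - f_hat_{i-1}(X_{i-1}), using only data up to time i - 1."""
    n = len(path) - 1
    return [
        path[i] - regression_estimate(path, i - 1, path[i - 1], beta)
        for i in range(1, n + 1)
    ]


def residual_kde(res, alpha, y):
    """Residual-based density estimate at y with bandwidths i**(-alpha), d = 1."""
    n = len(res)
    return sum(
        (i ** alpha) * epanechnikov((i ** alpha) * (r - y))
        for i, r in enumerate(res, start=1)
    ) / n


if __name__ == "__main__":
    path = simulate_path(300, math.sin)
    res = residuals(path, beta=0.2)
    print(residual_kde(res, alpha=0.4, y=0.0))
```

Each summand of `residual_kde` integrates to one over $y$, so the estimate is itself a probability density; the quality of the estimate then depends on how well the residuals approximate the unobserved noise, which is what $B_n(y)$ measures.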

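Proof of Theorem 4.1. • Since the gradient of $p$ is bounded, we deduce
that

$$\sup_{y \in \mathbb{R}^d} |B_n(y)| = O\Big( \sum_{i=1}^{n} \|\hat f_{i-1}(X_{i-1}) - f(X_{i-1})\| \Big) \quad \text{a.s.} \qquad \text{(C.2)}$$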



which, combined with Corollary 3.2, leads to



1 /n-'^^ 

-sup = )+0K) a.s. (C.3) 



n „ciR<i \ m 



where, in the context of Assumption [A2] with a sequence $(v_n)$ of the form
$A n^{\eta}$ with $A > 0$ and $\eta > 0$, $w_n = n^{-m\eta + 1/m}$ if $\eta < 1/m$ and $w_n = n^{-\eta(m-1)}$
otherwise. As we assumed that m > 2, rj > Xjw? and = o(n'^), we 
clearly obtain that supyg^d \Bn{y) \ = o{n) a.s. 

In the framework of [A2bis], Corollary 3.2 used with $v_n = \eta \log n$ where
$\eta > 0$, leads to (C.3) with $w_n = (\log n)\, n^{-a\eta}$ for $a < m$, and thus also implies
that $\sup_{y \in \mathbb{R}^d} |B_n(y)| = o(n)$ a.s.

• Applying Lemma A.2 to $G_{3,n}$, we obtain that for any $A > 0$ and $\nu > 0$,

$$\frac{1}{n} \sup_{\|y\| \le A n^{\nu}} |G_{3,n}(y)| = o(n^{\gamma - 1}) + O(n^{-\alpha}) \quad \text{a.s.} \qquad \text{(C.4)}$$

where $\gamma \in\, ](1 + \alpha d)/2, 1[$. Then, combining (C.3) together with (C.4) applied
with $A = 2$ and $\nu = 1/2$, we obtain that almost surely

sup \pM-p{y)\ = o{n^-')+0{n-")+o('^]+0{wn) (C.5) 

\\y\\<2^i \mn/ 
To close the proof of Theorem 4.1, let us show that

1' 



sup \Pn{y) - p{y)\ = o 

\\y\\>2^ \ 



a.s. 



(C.6) 



Under [A2] (and so [A2bis]), we know by (2.2) that almost surely
$\sup_{i \le n} \|X_i\| = O(\varepsilon_n^{*}) = o(n^{1/m})$ with $m > 2$. Moreover, thanks to the bound (B.9) on
$\hat f_n(x) - f(x)$, we derive that $\sup_{i \le n} \|\hat f_i(X_i)\| = O\big(1 + \varepsilon_n^{*} + \sup_{i \le n} \|X_i\|\big)$ a.s.
Thus, it follows that

$$\sup_{i \le n} \|X_i - \hat f_{i-1}(X_{i-1})\| = O(n^{1/m}) = o(\sqrt{n}) \quad \text{a.s.}$$

Hence, for $n$ large enough, $\|X_i - \hat f_{i-1}(X_{i-1})\| < \sqrt{n}$ a.s. for any $i \le n$, which
ensures that, for $y$ such that $\|y\| > 2\sqrt{n}$, $\|X_i - \hat f_{i-1}(X_{i-1}) - y\| > \sqrt{n}$ a.s.
Therefore, since $K$ is compactly supported, it clearly leads to



sup 

\\y\\>2^ 



n 

1=1 



0(1) a.s. 



and 



sup \Pn{y) \ = 



a.s. 



(C.7) 



(C.8) 



In addition, since $(\varepsilon_n)$ has a finite moment of order $m > 2$ and $p$ is positive,
it follows that $p(y) = O(\|y\|^{-2})$ for large values of $\|y\|$, leading to

$$\sup_{\|y\| > 2\sqrt{n}} p(y) = O\Big(\frac{1}{n}\Big). \qquad \text{(C.9)}$$



Consequently, (C.6) is deduced from (C.8) and (C.9), which completes the
proof of Theorem 4.1.



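Proof of Theorem 4.3. From the decomposition (C.1), we deduce that for
any $y \in \mathbb{R}^d$,

$$n^{(1-\alpha d)/2} \big(\hat p_n(y) - p(y)\big) = n^{-(1+\alpha d)/2}\, G_{3,n}(y) + n^{-(1+\alpha d)/2}\, B_n(y). \qquad \text{(C.10)}$$

Thanks to the second part of Lemma A.2, we derive that for any $\alpha \in\, ]1/(d + 2), 1/d[$,

$$n^{-(1+\alpha d)/2}\, G_{3,n}(y) \xrightarrow[n \to \infty]{\mathcal{L}} \mathcal{N}\big(0, (1 + \alpha d)^{-1} \|K\|_2^2\, p(y)\big). \qquad \text{(C.11)}$$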
Therefore, to establish the pointwise CLT for $\hat p_n$, we only have to prove that

$$n^{-(1+\alpha d)/2}\, |B_n(y)| = o(1) \quad \text{a.s.} \qquad \text{(C.12)}$$

Using (C.3) and the fact that $\eta < 1/m$, condition (C.12) reduces to (4.4)
and, in the context of [A2], to

$$n^{(1-\alpha d)/2}\, n^{-m\eta + 1/m} = o(1). \qquad \text{(C.13)}$$

With a value of $\eta$ chosen in $]1/m^2, 1/m^2 + 1/(m(d + 2))[$, condition (C.13) is
fulfilled as soon as the bandwidth parameter $\alpha$ satisfies $\alpha d > 1 - 2(m\eta - 1/m)$.
In the framework of [A2bis], condition (C.13) is replaced by
$n^{(1-\alpha d)/2}\, n^{-a\eta} \log n = o(1)$ with $a < m$. Therefore, with a value of $\eta$ chosen
in $]0, 1/(2a)[$, it is fulfilled as soon as $\alpha d > 1 - 2a\eta$, which ends the proof
of the first part of Theorem 4.3.
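
The two bandwidth conditions can be turned into a small calculator. The Python sketch below returns the lower bound on $\alpha$ implied by $\alpha d > 1 - 2(m\eta - 1/m)$ under [A2] and by $\alpha d > 1 - 2a\eta$ under [A2bis]. The function names are hypothetical, and the admissible ranges for $\eta$ are those read from the proof above, so they should be treated as assumptions of the sketch.

```python
def alpha_lower_bound_a2(m, eta, d):
    """Lower bound on alpha under [A2]: alpha * d > 1 - 2 * (m * eta - 1 / m)."""
    low = 1.0 / m ** 2
    high = low + 1.0 / (m * (d + 2))
    if not (m > 2 and low < eta < high):
        raise ValueError("expected m > 2 and eta in ]1/m^2, 1/m^2 + 1/(m(d+2))[")
    return (1.0 - 2.0 * (m * eta - 1.0 / m)) / d


def alpha_lower_bound_a2bis(a, eta, d):
    """Lower bound on alpha under [A2bis]: alpha * d > 1 - 2 * a * eta."""
    if not (a > 0.0 and 0.0 < eta < 1.0 / (2.0 * a)):
        raise ValueError("expected a > 0 and eta in ]0, 1/(2a)[")
    return (1.0 - 2.0 * a * eta) / d


if __name__ == "__main__":
    # Illustrative values only.
    print(alpha_lower_bound_a2(m=4, eta=0.1, d=1))
    print(alpha_lower_bound_a2bis(a=2.0, eta=0.1, d=1))
```

On the stated range of $\eta$, the [A2] bound stays between $1/(d + 2)$ and $1/d$, which is compatible with the range of $\alpha$ required by Lemma A.2.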

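The multivariate CLT remains to be proven. Taking the previous results into
account, it is enough to establish that for two distinct points $x, y \in \mathbb{R}^d$,

$$n^{-(1+\alpha d)/2} \begin{pmatrix} G_{3,n}(x) \\ G_{3,n}(y) \end{pmatrix} \xrightarrow[n \to \infty]{\mathcal{L}} \mathcal{N}\left(0, \frac{\|K\|_2^2}{1 + \alpha d} \begin{pmatrix} p(x) & 0 \\ 0 & p(y) \end{pmatrix}\right)$$

This result is straightforwardly given by Lemma A.2, which closes the proof
of Theorem 4.3. □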